A Technique for Segmentation of Gurmukhi Text
Abstract
This paper describes a technique for segmenting text in machine-printed Gurmukhi script documents. Segmenting Gurmukhi script is difficult mainly because of characteristics unique to the script. Characters in a word are connected along the headline, and two or more characters in a word can have intersecting minimum bounding rectangles. Some characters consist of multiple components, and touching characters occur even in clean documents. The paper discusses in detail the segmentation problems specific to Gurmukhi script, such as horizontally overlapping text segments and touching characters in the various zones of a word, and it proposes a solution.
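Headline connectivity is why naive splitting fails: every character in a word shares the headline, so no blank column separates them. Scripts with a headline have a common baseline approach, which is not necessarily the technique this paper proposes. It finds the headline as the densest row of the horizontal projection profile, erases that row, and then splits the word at blank columns of the vertical projection. The minimal sketch below makes two assumptions: the input is a binary word image given as a list of rows (1 = ink), and the headline thickness is known in advance. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image: 1 = ink pixel, 0 = background


def horizontal_profile(img: Image) -> List[int]:
    """Count ink pixels in each row."""
    return [sum(row) for row in img]


def find_headline(img: Image) -> int:
    """Return the index of the densest row (first one on ties).

    Assumption: in a clean printed word image, the headline is the row
    with the most ink pixels.
    """
    profile = horizontal_profile(img)
    return max(range(len(profile)), key=profile.__getitem__)


def remove_headline(img: Image, row: int, thickness: int = 1) -> Image:
    """Return a copy of img with `thickness` rows starting at `row` cleared."""
    out = [r[:] for r in img]
    for y in range(row, min(row + thickness, len(out))):
        out[y] = [0] * len(out[y])
    return out


def split_columns(img: Image) -> List[Tuple[int, int]]:
    """Return column ranges [start, end) separated by all-blank columns."""
    width = len(img[0]) if img else 0
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x in range(width):
        has_ink = any(row[x] for row in img)
        if has_ink and start is None:
            start = x  # a new segment begins
        elif not has_ink and start is not None:
            segments.append((start, x))  # a blank column ends the segment
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def segment_word(img: Image, headline_thickness: int = 1) -> List[Tuple[int, int]]:
    """Erase the detected headline, then split at blank columns."""
    row = find_headline(img)
    return split_columns(remove_headline(img, row, headline_thickness))


# Toy word image: row 0 is a headline joining two character shapes.
word = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(split_columns(word))  # [(0, 9)]: the headline hides the gap
print(segment_word(word))   # [(0, 3), (5, 7)]
```

Splitting at blank columns still fails in exactly the cases the abstract lists. Touching characters, and characters whose bounding rectangles overlap horizontally, leave no blank column between them, so they come out as a single segment.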
Similar resources
Segmentation Problems and Solutions in Printed Degraded Gurmukhi Script
Character segmentation is an important preprocessing step for text recognition. In degraded documents, the presence of touching characters drastically reduces the recognition rate of any optical character recognition (OCR) system. In this paper we propose a complete solution for segmenting touching characters in all three zones of printed Gurmukhi script. A study of touching Gurmukhi cha...
On Segmentation of Touching Characters and Overlapping Lines in Degraded Printed Gurmukhi Script
Character segmentation plays a very important role in a text recognition system. The simple technique of using the inter-character gap for segmentation works for cleanly printed documents, but it fails to give satisfactory results when the input text contains touching characters. In this paper, we propose two algorithms to segment touching characters, and one algorithm to segment o...
A Study of Touching Characters in Degraded Gurmukhi Text
Character segmentation is an important preprocessing step for text recognition. In degraded documents, the presence of touching characters drastically reduces the recognition rate of any optical character recognition (OCR) system. In this paper, touching Gurmukhi characters are studied and, after careful analysis, divided into various categories. Structural ...
A Script Independent Technique for Extraction of Characters from Handwritten Word Images
A script-independent technique for segmenting characters from word images is reported here. Word-to-character segmentation is an important preprocessing step in optical character recognition. In handwritten text, however, touching characters reduce the accuracy of segmenting characters from a word. In this paper, segmentation of handw...
Conversion between Scripts of Punjabi: Beyond Simple Transliteration
This paper describes statistical techniques used for modelling transliteration systems between the scripts of the Punjabi language. Punjabi is one of the languages written in more than one script. In India, Punjabi is written in Gurmukhi script, while in Pakistan it is written in Shahmukhi (Perso-Arabic) script. Shahmukhi script has its origin in the ancient Phoenician script wher...